[BEAM-3525] Fix serialization of PipelineOptions in TestPipeline#run() #4478
pgerv12 wants to merge 1 commit into
|
+reviewer: @lukecwik |
|
It seems good in general that TestPipeline tests that the serialization of pipeline options works. In particular, this may be needed if we want to pass options through the Runner API to the actual runner. |
|
Ok, then we need a way for tests to bypass this so specified options aren't dropped. |
|
In the portable future, options are allowed/expected to be serialized on their way to a runner. So that means |
|
@kennknowles Ah, ok. Thank you for the explanation! |
|
Serialization of pipeline options should only matter between SDK and runner like Kenn says and not between test infrastructure and SDK. I'm all for having a good way to do args parsing and dependency passing outside of PipelineOptions, I just don't see that being feasible while being backwards compatible. Also using two argument parsing systems is typically awkward so I can't see a migration away from using PipelineOptions, only a full swap to something else. |
lukecwik
left a comment
Add a test so that this doesn't regress.
|
What I mean is that just using The |
|
@pgerv12 can you describe the particular problem that happened? I do think that existing runners may have a lot of dependencies on non-serializable options just for configuration. But for non-Java SDKs the options have to come serialized, so we need to fix the runners to have all of their configuration specified in ways that can be sent that way. |
|
@kennknowles The Streams Runner has a I can see where the non-Java SDK need arises and it is problematic since this is purely Java SDK-based. A proper solution sounds like there needs to be two types of serialization: one for complete For reference, it sounds like |
|
I see. This makes a lot of sense, and should be supported until we have a complete portability story. For reference, in the portable model the Java SDK harness will execute the user's UDFs, so it would own options like this. Artifact staging goes over the "Artifact API" (just a proxy for staging via a runner-specific mechanism) and the Java construction side and Java execution side use it however they want, and just have to agree. So then For your case, now, can you just make the option not |
|
Yes, our workaround is to remove the annotations, and things work well with the change. I'm glad we're having this discussion because I didn't know what the proper behavior should be. |
|
Dropping the |
|
Is it the right approach? My argument is that Edit: I understand that for portability serialization needs to occur, so I would say if the goal is portability then the |
|
@JsonIgnore seems like the wrong annotation to use to distinguish between
values that are used by the runner vs. the workers, especially as the
options may have to be serialized to even make it to the runner. If this
distinction is important, we could add yet another annotation, though I'm
not sure it's worth its weight (there's little harm in passing the full
configuration to the workers even if part of it is ignored).
…On Thu, Jan 25, 2018 at 6:52 AM, Paul Gerver ***@***.***> wrote:
Is it the right approach? My argument is that filesToStage should be
marked with @JsonIgnore since it's not really needed at the worker side.
If that is or isn't the case, looking at the FlinkPipelineOptions,
getStateBackend() is marked @JsonIgnore and it sounds like if I wanted to
run tests with a different backend then that wouldn't be doable from my
test parameters because of this problem (or if I used a non-Java SDK, I
could never be able to set the backend through parameters since my
arguments would be serialized away).
|
There are reasons why a user may actually want to know what files were staged, and this could be one way for them to get it. The original reason why it was marked with |
This PR removes the premature serialization of PipelineOptions that happens before the TestPipeline is run (running it creates a runner instance, which in turn uses the options to run the Beam application). Because this serialization occurs before the selected runner has a chance to look at the options, any option marked with JsonIgnore does not survive. That leads to undesired behavior, since serialization is really only needed between job creation/submission and runtime.
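To make the failure mode concrete, here is a minimal, self-contained sketch of how a serialization round trip can silently reset an ignored option to its default. It does not use Beam or Jackson: `SkipOnSerialize` is a hypothetical stand-in for `@JsonIgnore`, `DemoOptions` is a hypothetical options bean, and the reflection-based `serialize`/`deserialize` helpers are assumptions made only for illustration, not the code path `TestPipeline` actually takes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class IgnoredOptionRoundTrip {
  // "Serialize": copy every getter value except those marked @SkipOnSerialize.
  static Map<String, String> serialize(DemoOptions options)
      throws ReflectiveOperationException {
    Map<String, String> out = new TreeMap<>();
    for (Method m : DemoOptions.class.getDeclaredMethods()) {
      if (m.getName().startsWith("get")
          && m.getParameterCount() == 0
          && !m.isAnnotationPresent(SkipOnSerialize.class)) {
        out.put(m.getName().substring(3), (String) m.invoke(options));
      }
    }
    return out;
  }

  // "Deserialize": start from defaults and apply only what was serialized.
  static DemoOptions deserialize(Map<String, String> data)
      throws ReflectiveOperationException {
    DemoOptions options = new DemoOptions();
    for (Map.Entry<String, String> e : data.entrySet()) {
      DemoOptions.class
          .getMethod("set" + e.getKey(), String.class)
          .invoke(options, e.getValue());
    }
    return options;
  }

  static DemoOptions roundTrip(DemoOptions options)
      throws ReflectiveOperationException {
    return deserialize(serialize(options));
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    DemoOptions original = new DemoOptions();
    original.setJobName("my-test");
    original.setStateBackend("custom-backend");

    DemoOptions copy = roundTrip(original);
    System.out.println(copy.getJobName());      // prints "my-test"
    System.out.println(copy.getStateBackend()); // prints "default-backend"
  }
}

// Hypothetical options bean; the property names are illustrative only.
class DemoOptions {
  private String jobName = "default-job";
  private String stateBackend = "default-backend";

  public String getJobName() { return jobName; }
  public void setJobName(String v) { jobName = v; }

  // Marked as ignored, so it never reaches the serialized form.
  @SkipOnSerialize
  public String getStateBackend() { return stateBackend; }
  public void setStateBackend(String v) { stateBackend = v; }
}

// Hypothetical stand-in for Jackson's @JsonIgnore; not a Beam or Jackson type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SkipOnSerialize {}
```

In this sketch the explicitly set `stateBackend` is lost before anything downstream can read it. That mirrors the complaint here: the runner created by the test only ever sees the post-round-trip copy.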